OcrV1, Main, Exploration, bibRecord, 000A82

Feature string-based intelligent information retrieval from Tamil document images

Internal identifier: 000A82 (Main/Exploration); previous: 000A81; next: 000A83

Authors: S. Abirami [India]; D. Manjula [India]

Source:

International journal of computer applications in technology [0952-8091]; 2009.

RBID : Pascal:10-0181228

French descriptors

Pascal (Inist)
- Chaîne caractère, Recherche information, Recherche documentaire, Langage naturel, Texte, Reconnaissance image, Image optique, Reconnaissance optique caractère, Reconnaissance caractère, Analyse image, Traitement image, Bibliothèque électronique, Lettre alphabet, Procédé extraction, Mot clé, Extraction forme.
Wicri:
- topic: Recherche documentaire.

English descriptors

KwdEn:
- Character recognition, Character string, Document retrieval, Electronic library, Extraction process, Image analysis, Image processing, Image recognition, Information retrieval, Keyword, Letter, Natural language, Optical character recognition, Optical image, Pattern extraction, Text.

Abstract

Information Retrieval (IR) from document images has become a growing and challenging problem as its popularity rises. This paper proposes a simple and effective method to extract text and perform intelligent IR from Tamil document images without Optical Character Recognition (OCR). The method generates a feature string for every word image by extracting its features. It relies on the basic characteristics or shapes of the letters rather than recognising the letters themselves, as OCR does. The strength of the technique lies in extracting text from basic features, such as lines and black-and-white disposition rates within characters, which are almost the same for a given character across font sizes and font faces. In an offline process, document images are preprocessed, and a text extraction step derives features from the word images based on their shapes and stores them in temporary files. During online retrieval, a textual keyword is obtained from the user and its primitive string is framed. IR is then performed using the primitive string, and the resulting images are returned to the user. The technique could be readily adopted in large digital libraries for IR.
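
The offline indexing and online lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two features used here (a quantised black-pixel rate and a vertical-stroke flag), the tiny 2x2 glyph bitmaps, the image identifiers, the placeholder letters "x" and "y", and the `primitives` table are all assumptions made for illustration. The sketch also assumes that each word image has already been segmented into per-character bitmaps.

```python
from collections import defaultdict


def black_rate_code(glyph):
    """Quantise the share of black pixels (1s) in a glyph bitmap into A-D."""
    pixels = [p for row in glyph for p in row]
    rate = sum(pixels) / len(pixels)
    return "ABCD"[min(int(rate * 4), 3)]


def vertical_line_code(glyph):
    """Return 'V' if some column is entirely black (a vertical stroke), else '-'."""
    return "V" if any(all(col) for col in zip(*glyph)) else "-"


def feature_string(glyphs):
    """Concatenate the two feature codes of every character in a word image."""
    return "".join(black_rate_code(g) + vertical_line_code(g) for g in glyphs)


def build_index(word_images):
    """Offline step: map each feature string to the word images that produce it."""
    index = defaultdict(list)
    for image_id, glyphs in word_images:
        index[feature_string(glyphs)].append(image_id)
    return index


def search(index, keyword, primitives):
    """Online step: frame the keyword's primitive string and look it up."""
    query = "".join(primitives[ch] for ch in keyword)
    return index.get(query, [])


# Hypothetical 2x2 glyph bitmaps (1 = black pixel, 0 = white pixel).
BAR = [[1, 0], [1, 0]]
DOT = [[0, 0], [0, 1]]

# Hypothetical word images, already segmented into characters.
word_images = [("page1_word3", [BAR, DOT]), ("page2_word7", [DOT, DOT])]
index = build_index(word_images)

# Hypothetical table giving the feature codes of each keyword letter.
primitives = {"x": feature_string([BAR]), "y": feature_string([DOT])}

print(search(index, "xy", primitives))
```

Because the keyword is converted into the same code alphabet as the indexed word images, retrieval reduces to a string lookup and no character is ever recognised. A real system would need features that tolerate noise and would probably match strings approximately rather than exactly.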

Affiliations:

India

Links to previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000192
to stream PascalFrancis, to step Curation: 000585
to stream PascalFrancis, to step Checkpoint: 000198
to stream Main, to step Merge: 000A92
to stream Main, to step Curation: 000A82

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Feature string-based intelligent information retrieval from Tamil document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">10-0181228</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0181228 INIST</idno>
<idno type="RBID">Pascal:10-0181228</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000192</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000585</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000198</idno>
<idno type="wicri:doubleKey">0952-8091:2009:Abirami S:feature:string:based</idno>
<idno type="wicri:Area/Main/Merge">000A92</idno>
<idno type="wicri:Area/Main/Curation">000A82</idno>
<idno type="wicri:Area/Main/Exploration">000A82</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Feature string-based intelligent information retrieval from Tamil document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Character string</term>
<term>Document retrieval</term>
<term>Electronic library</term>
<term>Extraction process</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Information retrieval</term>
<term>Keyword</term>
<term>Letter</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Optical image</term>
<term>Pattern extraction</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Chaîne caractère</term>
<term>Recherche information</term>
<term>Recherche documentaire</term>
<term>Langage naturel</term>
<term>Texte</term>
<term>Reconnaissance image</term>
<term>Image optique</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Analyse image</term>
<term>Traitement image</term>
<term>Bibliothèque électronique</term>
<term>Lettre alphabet</term>
<term>Procédé extraction</term>
<term>Mot clé</term>
<term>Extraction forme</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Information Retrieval (IR) from document images has become a growing and challenging problem as its popularity rises. This paper proposes a simple and effective method to extract text and perform intelligent IR from Tamil document images without Optical Character Recognition (OCR). The method generates a feature string for every word image by extracting its features. It relies on the basic characteristics or shapes of the letters rather than recognising the letters themselves, as OCR does. The strength of the technique lies in extracting text from basic features, such as lines and black-and-white disposition rates within characters, which are almost the same for a given character across font sizes and font faces. In an offline process, document images are preprocessed, and a text extraction step derives features from the word images based on their shapes and stores them in temporary files. During online retrieval, a textual keyword is obtained from the user and its primitive string is framed. IR is then performed using the primitive string, and the resulting images are returned to the user. The technique could be readily adopted in large digital libraries for IR.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
</country>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
</noRegion>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A82 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A82 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0181228
   |texte=   Feature string-based intelligent information retrieval from Tamil document images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
